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ABSTRACT 

In  seeking  to  solve  an  unconstrained  minimization  problem,  one  often 
computes  steps  based  on  a  quadratic  approximation  q  to  the  objective  func¬ 
tion.  A  reasonable  way  to  choose  such  steps  is  by  minimizing  q  constrained 
to  a  neighborhood  of  the  current  iterate.  This  paper  considers  ellipsoidal 
neighborhoods  and  presents  a  new  way  to  handle  certain  computational  de¬ 
tails  when  the  Hessian  of  q  is  indefinite,  paying  particular  attention 
to  a  special  case  which  may  then  arise.  The  proposed  step  computing  algo¬ 
rithm  provides  an  attractive  way  to  deal  with  negative  curvature.  Imple¬ 
mentations  of  this  algorithm  have  proved  very  satisfactory  in  the  nonlinear 
least-squares  solver  NL2SOL. 
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SIGNIFICANCE  AND  EXPLANATION 

Unconstrained  minimization  problems  arise  in  many  contexts.  Thus, 

* 

given  an  objective  function  $  :  !Rn  ->  IR  ,  one  often  must  seek  a  point  x 
which  minimizes  ^  (x)  .  It  is  usually  necessary  to  resort  to  some  itera¬ 
tive  procedure:  one  computes  a  sequence  of  iterates  x^ ,  x^,  • • •  which, 
if  all  goes  well,  converges  to  x*.  Given  the  current  iterate  x  =  x^, 

one  commonly  uses  a  quadratic  approximation  q  to  ^  in  computing  the  new 
iterate  x+  =  x^,'-  Since  q  may  only  be  accurate  on  a  neighborhood 

N  of  x,  one  appealing  way  to  compute  x+  is  by  minimizing  q  on  a 
specified  N.  This  paper  deals  with  ellipsoidal  neighborhoods  N  and 
presents  a  new  way  to  handle  certain  computational  details  when  the  Hessian 
of  q  is  indefinite  (i.e.,  not  positive  definite).  The  resulting  algorithm 
provides  an  attractive  way  to  compute  x+  even  when  q  is  not  convex.  Modules 
implementing  this  algorithm  give  very  satisfactory  performance  in  the  non-linear 
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1 •  I ntroduct ion 


Many  unconstrained  minimization  algori  ;hms  employ  a  sequence  of  quadratic 

approximations  <P^  :  3R n  IR  ,  i  =  0,1,2,...  to  the  objective  function 

f*  :  IRn  ->  ]R.  Using  them,  they  determine  a  sequence  xQ,  x^,  x2,...  of  points 

which  (usually)  are  ever  better  approximations  to  a  local  minimizer  x*  of  <f* . 

Given  the  current  iterate  x  =  x^  and  quadratic  model  >fi  =  <P^,  these  algorithms 

usually  compute  the  next  iterate  x+  =  x^+^  in  one  of  two  ways.  Either  they 

determine  a  Newton  step  s  such  that  ■ p(x  +  x)  is  minimized,  then  set 

★  * 

x+  =  x  +  Xs,  where  X  >  0  is  chosen  so  that  ip  (x  +  Xs)  <  <p  (x)  ,  or  they 

choose  a  neighborhood  N  =  N^  of  0  and  a  point  s*eN  which  minimizes  ip  (x  +  s) 

★ 

over  seN,  and  they  •  set  x+  =  x  +  s  ,  having  taken  care  in  choosing  N  that 
i P*(x  +  s*)  <  ifi* (x) .  Algorithms  of  the  latter  sort  have  an  intuitive  appeal: 
i P  often  approximates  >p*  well  only  in  a  neighborhood  of  x,  and  these  algo¬ 
rithms  attempt  to  achieve  the  maximum  function  reduction  possible  on  an  edu¬ 
cated  guess  at  such  a  neighborhood.  This  paper  concerns  itself  with  choosing 
s*  in  this  sort  of  algorithm,  given  x  and  an  N  of  the  reasonable  form 

* 

described  below.  Since  we  can  expect  the  quadratic  approximation  <p  to  tp 

only  to  be  accurate  in  a  neighborhood  x  +  N  of  the  current  iterate  x,  we 

shall  refer  to  a  point  s*  that  minimizes  ip  (x  +  s)  subject  to  seN  as  an 

otpimal  locally  constrained  (OLC)  step. 

Suppose  now  that  we  have  geJR n  and  a  symmetric  nxn  matrix  HeIRnxn 
*  TIT 

such  that  tp  (x  +  s)  =  <p  (x)  +  g  s  +  —  s  Hs.  (We  regard  vectors  as  column 
vector  and  use  superscript  T  for  "transpose".  Superscript  -T  means 
"inverse  transpose".  Of  course,  we  have  in  mind  that  g  =  Vip*(x)  and 
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H  =  v  (x)  in  a  suitable  sense.  We  lose  no  generality  in  assuming  x  =  0 

* 

and  *  (x)  =  0,  so  that 

(1.1)  V>(s)  =  gTs  +  j  sTHs. 

For  the  rest  of  this  paper  we  assume  that  N  has  the  form 

(1.2)  N  =  {y  ElR n  :  IJ  Dy  II  <  5}  , 

where  DelRnXn  is  nonsingular,  11*11  denotes  the  Euclidean  norm  IMI 
t  T  1/2 

(i.e.,  Ilyll  =  (y  y)  ),  and  6  >  0.  Some  of  the  algebra  below  is  simpler 

if  D  is  the  identity  matrix  I.  It  is  possible,  in  effect,  to  arrange  this 
by  a  change  of  variables.  Let 

(1.3a)  g  =  D_Tg 

(1.3b)  H  =  d"THD_1, 

(1.3c)  H  =  {yERn  :  ||y||  <  6}  , 

and 

*>-  —  2. 

C  3<3)  ifi  (s)  =  g  s  +  —  s  Hs. 

If 

(l.-,e)  s  =  Ds, 


then  (s)  =  >(i  (s) 

and 

s£N  if  and  only  if 

SE  N ,  so 

in 

proofs  we 

may  just  as 

well  deal  with  ^ 

and 

N,  for  which  D  =  I, 

as  with 

* 

and  N 

Therefore  we 

uosume  without  loss  of  generality  in  the  proofs  below  that  D  =  I.  It  is 
often  useful  in  practice  to  choose  D  ^  I,  e.g.  to  reflect  scale  in  the 


components  of  s,  so  we  leave  D  in  the  statements  of  some  results. 

Some  of  the  proofs  below  are  simpler  if  H  is  a  diagonal  matrix. 

We  may  assume  this  without  losing  generality,  because  we  can  also  arrange  for 

it,  in  effect,  by  changing  variables.  Specifically,  since  H  is  symmetric, 

nXn  %  ~  T 

there  exists  an  orthogonal  matrix  Ve]R  such  that  H  =  VHV  is  diagonal 

as  ~  as  ss  as'pfe  1 

with  nonincreasing  diagonal  elements.  Thus  if  g  =  Vg,  <Ms)  =  g  s  +  s  Hs  , 

and  s  =  Vs,  then  <p(s)  =  ^(s)  and  seN  if  and  only  if  seN. 

In  many  applications  H  is  positive  definite.  In  others,  however,  H 

may  have  one  or  more  nonpositive  eigenvalues.  This  may  happen,  for  instance, 

when  H  comes  from  the  "augmented  model"  in  the  NL2S0L  algorithm  [DenGW79] . 

We  therefore  consider  the  general  case  where  H  may  be  indefinite.  The  scheme 

we  are  discussing  thus  provides  an  appealing  way  to  deal  with  negative  curvature 

In  the  next  section  we  show  that  an  OLC  step  s*  satisfies 

(H  +  OD  D)s  =  -g  for  some  oc  >  0  such  that  H  +  «D  D  is  positive  semidefinite 

Our  treatment  differs  from  that  of  Goldfeld,  Quandt,  and  Trotter  [G0IQT66]  in 

T 

that  we  consider  the  special  case  where  H  +  °D  D  is  (nearly)  singular.  This 
case  requires  special  handling,  which  we  discuss  in  §  3.  In  §  4  we  give  a  com¬ 
plete  algorithm  for  computing  s* ,  and  we  discuss  numerical  experience  with  it 
in  the  context  of  NL2S0L  in  §5. 

The  present  work  builds  on  that  of  many  others.  Among  the  first  papers 

to  consider  computing  an  OLC  step  was  that  of  Marquardt  [Mar63] ,  in  which  <fi  * 

was  a  nonlinear  sum  of  squares.  (The  paper  of  Levenberg  [Lev44]  that  is  often 

mentioned  in  the  same  breath  actually  considers  a  somewhat  different  step,  one 

2 

that  minimizes  ^(x  +  s)  +  ||ds||  .)  Goldfeld,  Quandt,  and  Trotter  considered 

*  T 

a  qeneral  ^  but  restricted  themselves  to  the  case  where  H  +  a  D  D  is  posi¬ 
tive  definite.  Hebden  [Heb73]  gave  an  interesting  algorithm  that  computes  a 


*  T 

good  approximation  to  s  when  H  +  ao  D  is  sufficiently  positive  definite  and 

* 

that  otherwise  computes  what  often  would  be  a  reasonable  step.  More  [Mor78] 

refined  Hebden's  scheme  and  specialized  it  to  a  good  algorithm  for  the  case 

where  is  a  sum  of  squares.  The  algorithm  we  give  in  §4  incorporate  More's 

refinements  of  Hebden's  scheme  along  with  a  new,  often  more  reasonable  way  to 

T 

handle  the  case  of  a  (nearly)  singular  H  +  «D  D.  When  specialized  to  least- 

A 

squares  problems,  the  new  algorithm  computes  the  same  step  as  does  More's,  but 
may  expend  less  work  in  the  special  case. 


2.  Characterizing  s* 

The  following  characterization  of  an  OLC  step  s*  lies  at  the  heart  of  the 
algorithm  in  §4  for  approximating  s*. 

Theorem  2.1  If  $  and  N  are  given  by  (1.1)  and  (1.2),  then  s*eN  minimizes 

T 

<t( s)  over  s£N  if  and  only  if  there  exists  a*  >  0  such  that  H  +  «*D  D  is 
positive  semidefinite  and 

(2.1)  (H  +  a*DTD)s*  =  -g, 

with  ||  Ds*||  =  8  if  a*  >  0.  If  g  ^  0  or  H  has  a  negative  eigenvalue,  then 
a*  is  unique . 

Proof :  Without  loss  of  generality,  D  =  I,  (2.1)  has  the  form 

(2.2)  (H  +  or*l)  s*  =  -g, 

and  H  =  diag(h, , . . . ,h  ) ,  with  h,  >  h_  >  . . •>  h  . 

in  12  n 

T 

If  g  =  g  , ...,g  )  has  g  =0  for  i  <  n,  then  the  theorem  is  easily 
In  l 

seen  to  hold,  so  assume  g^  t  0  for  at  least  one  i  <  n. 

(Only  if) :  Suppose  s*  minimizes  over  N.  By  the  second-order 
necessary  conditions  in  Theorem  4  of  (McC76]  there  exists  >  0  such  that 

(2.2)  holds  and  either  a*  =  0,||  s*||  <  5fand  H  is  positive  semidefinite,  or 

else  a*  >  0,  ||s*||  =  6,  and 

(2.3)  yT(H  +e*I)y  >  0  for  all  yeRn  with  yTs*  =  0. 

The  "only  if"  assertion  thus  holds  if  a*  =  0,  so  assume  «*  >  0.  In  this  case, 

(2.3)  implies  that  H  +  o*i  has  at  most  one  negative  eigenvalue,  so  h^ >  -a* 
for  i  <  n  and  we  must  show  that  h  >  -a*.  This  is  clearly  so  if  h  =  h  , 


For  a  a*  ,  define  s  (a)  e  ]Rn  by 


so  assume  h  <  h 

n  n-1 

0  if  h.+a=h.  +a*=0  with  i  <  n; 

a  1 

s.  (a)  =  -g./(h.  +  a)  if  i  <  n  and  h.  +  a  >  0; 

ill  l 

-sign (g  )  [fi2  -  \  (s  .  (  a)  )  2  ]1^2  if  i  =  n. 

n  D 

:<n 

If  +  a*  =0  with  i < n,  then  (2.2)  implies  g^  =  0,  so  it  is  reasonable  to  set 

s  (a*)  =  0  in  this  case.  Indeed,  since  s*  minimizes  on  N,  it  is  readily  verified 

that  s*  =  0  if  h^  +  a*  =  0  and  then  that  s*  =  s(a*)»  provided  that  signfg^) 

is  properly  chosen  when  g^  =  0.  Let  ^  (  a)  =  ( s  (  a) )  •  Then  \p(a*)  =  <p  (s*)  .  Now 

if  h  +  a*  were  negative,  then  there  would  clearly  exist  >  -  h  >  a*  such 
n  n 

that  (2.2)  held  with  s*  replaced  by  s#  =  s(a^).  But  ||  s#  ||  =  6  and  it  is 

easily  verified  that  (a)  <0  for  a*  <  a  <  a# »  so  s#  would  be  a  point  in 

N  with  (s%)  <  >fi(s*) .  This  inequality  (which  also  follows  from  Theorem  7.1 

of  [Gan78] ) would  contradict  the  choice  of  s*.  Thus  h  +  a*  must  be  nonnegative 

n 

and  H  +  a*I  is  positive  semidef inite . 

(If):  Since  N  is  compact,  there  exist  an  s*  and  a  *  of  the  sort 
jst  considered.  To  complete  the  proof,  we  study  the  extent  to  which  they  are 
det  rmined  by  (2.2).  Therefore  we  now  redefine  s:[0,oo)->  ]R  n  to  be  an  arbi¬ 
trary  function  such  that 


(2.4) 

(H  + 

al) s  (a) 

=  -g- 

Let  k  be 

such  that 

h ,  >  h 
l  n 

for  i  <  k 

and  h.  =  h 
l  n 

for  i  >  k,  and  for 

>s 

II 

> 

■  ,y  )T,  let 
n 

II 

(0, . . . ,0,y  , 

T 

. . - ,y  )  .  For 
n 

a  >  -h  , 
n 

-  (H  +  aT)  '''g  is  uniquely  determined  by  (2.4)  and  ||  s(a)  j|  decreases 

strictly  monotonically .  On  the  other  hand,  if  a  =  -  h  ,  then  (2.4)  can  hold 

n 


3.  (Near)  Singularity  in  H  +  g*DTp 

When  g  /  0  (as  we  henceforth  assume) ,  it  is  usually  possible  to  compute 
an  approximate  OLC  step  s  as 

(3.1)  s  =  s(a)  =  - (H  +  aDTD)  1g, 

T 

where  a  ^  0  is  chosen  so  that  H  +  aD  D  is  positive  definite  and  ||  Dsfct)|| 
is  near  $.  Such  a  step  s  exactly  minimizes  < p  (given  by  (1.1))  on  an  approxi- 
mation  to  the  "trust  region"  N  given  by  (1.2)  If  H  +  a*D  D  is  singular  in 

(2.1) ,  however,  then  it  may  be  impossible  to  compute  a  suitable  s  in  this  way, 

because  it  can  happen  that  ||  Ds  (a)  ||  is  considerably  less  than  6  for  all  a 

T  T 

that  make  H  +  aD  D  positive  definite.  And  if  H  +  *D  D  is  nearly  singular, 

then  computing  a  sufficiently  large  s(a)  may  be  impractical  or  at  least  unduly 

costly.  Of  course,,  we  could  simply  accept  a  "short"  s(a).  But  if  H  has  a 

significantly  negative  eigenvalue  and  is  a  good  approximation  to  the  current 

2  A 
true  Hessian  V  , p  (x)  ,  then  it  il>ay  prove  well  worthwhile  to  compute  a  step  s 

having  a  significant  component  in  the  direction  of  an  eigenvector  of  H  cor¬ 
responding  to  the  smallest  eigenvalue.  In  what  follows  we  describe  a  simple 
way  to  compute  such  a  step  s.  This  step  approximately  minimizes  >p  (to  within 
a  prescribed  tolerance)  on  the  exact  trust  region  N  of  (1.2) . 

To  simplify  the  notation,  we  assume  for  the  rest  of  this  section  that 

D  =  I  and  H  =  diag(h.  , . . .  ,h  )  with  h  >  h_>...>h  ,  as  in  the  proof  of 

in  1  2  n 

Theorem  2.1.  We  also  assume  h  <  0. 

n 

When  hfi  <  0,  the  algorithm  of  §4  maintains  a  lower  bound  rj  on  accept¬ 
able  values  of  a  such  that  n  <  -h  .  If  a  >  -h  yields  II  s (a)  II  <6  ,  then 

n  n  n  M 

(3.2a)  r?  <  -h  <ot*  <  a  • 

n 


It  will  be  convenient  to  let 


(3.2b) 


9  =  a  -  n  • 


We  may  regard  H  +  a*I  as  nearly  singular  if  we  encounter  an  a  >  -h^  for  which 
0  is  sufficiently  small  (in  the  sense  made  precise  in  Theorem  3.2  below)  and 
for  which  ]|  s  (  ot) )  ]  is  unacceptably  small,  say 


s  ( a)  <  36  , 


for  some  prescribed  3c  (0,1),  e.g.  3  =  0.75  or  3  =  0.9. 

Suppose  now  that  (3.3)  holds,  and  let  s  =  s(a).  For  h.  >  h^,  we  may 

* 

expect  that  s.  =  s.,  while  for  h.  =  h  ,  we  may  expect  g.  =  0.  Thus  it  seems 
ii  in  i 

reasonable  to  consider  computing  an  approximate  OLC  step  s  by  finding  an 

approximate  eigenvector  v  of  H  corresponding  to  h^  and  adding  a  suitable 

multiple  of  v  to  s.  Because  of  (3.2),  H  +  I  has  an  eigenvalue  h^  +  a  with 

O^h  +a^a-n=8  (and  eigenvectors  corresponding  to  this  eigenvalue  are 
n 

also  eigenvectors  of  H  corresponding  to  h  ) .  We  probably  have  a  factorization 

n 

n 

of  H  +  ol,  so  for  k  >  1  (e.g.  k  =  2) ,  it  should  be  easy  to  compute  v  e  JR 
by  the  inverse  power  method  such  that 


(3.4a) 


=  1  and 


(3.4b) 


(H  +  ol)  v  <  K  0  . 


Given  such  a  v,  we  may  compute 


(3.5a) 


S  =  S  +  CTV  , 


with  a  chosen  so  that  ||  s||  =  6  and  <P(s)  is  minimized.  Specifically; 

Lemma  3.1;  If  <p(  t)  =  <^  ( s  +  tv),  and  if 


(3.5b) 


(  6  -  ||  s  ||  2  )/(sTv  +  sign  (sTv)  /  (sTv)  +  6  -  |j  s||  ]  , 


-9- 


T  T 

(with  sign(s  v)  =  1  or  -1  if  s  v  =  0)  ,  then  x  =  a  minimizes  tp(  x)  subject 

to  the  constraint  ||  s  +  xv  ||  =  6. 

Proof :  There  are  two  values  of  T  for  which  ||  s  +  xv  J|  =  6  ,  namely 


(3.6a) 


x+  =  -sTv  +  /  (sTv)2  +  6  2  -  ||  s  ||  2 


(3.6b) 


T  /,  T  ,2  7 

x  =  -s  v  -  /  (s  v)  +  6 


Because  of  (1.1)  and  (3.1), 


T  T  1  2  T 

T)  =  </?(  s )  +  xgv+xsHv+  —  x  v  Hv 


T  T  1  2  T 

=  v>( s)  +  X  [g  +  (H  +  a  I)  s]  v  -  xas  v  +  —  x  v  Hv 

2 


T  1  2  T 

=  <^( s)  -  xas  v  +  —  x  v  Hv. 


Using  (3.6)  and  (3.4a) ,  we  find 


\p(T+)  -  ip(i_)  =  -(x+-  x_)  asTv  +  j  (x+-  x_)(x+  +x_)vThv 


=  -(x  -  x_) (sTv) (a  +  vTHv) 


=  -2  /  (sTv)2  +  62  -  ||  s||2  (sTv)vT(H  +a  I)v. 


But  H  +  al  is  positive  definite,  so  v  (H  +  al)v  >  0  and  l£>(X+)  >  ip(T  )  if 

T  T  T  /  «p  2  2  2' 

and  only  if  s  v  <  0.  Thus  the  choice  x  =  -s  v  +  sign(s  v)  /  (s  v)  +5  ~  ||  s|| 

:air. imizes  t p  subject  to  ||  s  +  xv  ||  =  6,  which  is  equivalent  to  x  -O  with  a 

given  by  (3.5b;.  ■ 

We  now  consider  how  to  tell  whether  the  8  of  (3.2)  is  small  enough  that 


-he  relative  difference  between  ( s * )  and  if  is)  is  small,  i.e.,  that 
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(3.7) 


etf(s)  <fi(s*)  -  fils)  <  0 


for  some  prescribed  ee(0,l),  e.g.  e  =  0.1.  To  this  end,  it  is  convenient  to 
define  s:  [a*,  oo)->]Rn  by 


(3.8) 


sa(t)  =  \ 


s^(t)  =  -g^/(fu  +  t)  if  i  <  n; 


sign(g  )  /6  -  J  s.(r)  if  i  =  n. 


3  <n 


In  the  event  that  h.  =  h  for  some  i  <  n,  we  assume  without  losing  generality 

x  n 

that  g.  =  s*  =  0  and  interpret  (3.8)  as  specifying  s. (a*)  =0.  If  g  =0, 
ii  in 

★ 

then  we  assume  that  sign(g  )  =  -sign(s  )  in  (3.8).  Thus,  in  all  cases  s  is 

n  n 

a  smooth  function  with  s(a*)  =  s*  and  ||  s  ||  =  6  . 

To  bound  <^(s)  -  ip(s*)  ,  we  shall  first  bound  filsla))  -  ip(s*)  ,  then  bound 
-  (fils  (a)).  To  bound  (fils)  -  (fils*),  it  is  convenient  to  define  4>  :[a*,oo)->  IR 
by 


(3.9) 


iMt)  :=  ^(s(t)  )  -  g  s  (x) 
n  n 


I 

i<n 


'gi 


higi 


h .  +  t 
1 


2(hi+r) 


2  J 


f  h 

2  n 


62  -  l 


2 

gi 


i<n  (hi  +t) 


2  J 


|h62- 


\  l  g2 (2t  +  h.  +  h  ) / (h.  +t)' 
2  l  l  n  l 

3<n 


Note  that  ip  ’  (  t)  =  £  g2  (h  +t)/(h.  +  t)3  <  £  s*2  (h  +t)/(h^  +  x)<  £  s*2  < 

i<n  i<n  i<n 

Since  a  ~  a*  by  (3.2),  we  thus  find 
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•y 

(3.10)  iMa)  -  iM <x* )  <  06  • 

Now  (3.2)  and  (3.3)  imply 

(3.11)  |  g  |  <  (h  +  a)  ||  s  ||  <  966, 

while  (3.8)  implies  |sn(a*)  -  (ot )  |  <  6  .  Together  with  (3.9)  and  (3.10),  these 
yield 

(3.12)  V>(s(b))  -  *(s*)  =  iMa)  -  iMa*)  -  "  ®n(a)1 

«t  (1  +6)062. 

Now  we  deduce  a  bound  on  i^(s)  -  *i(s(a)).  For  brevity,  denote  s(a)  by  s  • 
Then  (1.1)  gives 

»  -  T  -  1  »T  a  1-T- 

(3.13)  ^(s)  -itfs(a))  =  g  (s  -  s)  +  -  s  Hs  -  j  s  Hs  . 

*T  - 

To  derive  a  convenient  expression  for  s  Hs,  it  is  useful  to  let 


(3.14)  f  =  (H  +  ctl )  v , 

so  that  Hv  =  f  -  av.  Note  from  (3.4b)  that 

(3.15)  ||  f  ||  <  *9  . 


Since  s  =  s  +  av  and 


6 ,  we  have 


c2  + 


_  T 
2as  v 


=  62  - 


and 


sHs 


T  2  T 

s  Hs  +  a  vHv  +  2as  Hv 


T  2  T 

s  Hs  +  a  (v  f 


T  T 

a)  +  2o  (s  f  -  as  v) 


=  sTHs  -  a[62  -  ||  s ||  2  J  +  o  [av  +  2s]  f. 

-  T 

Similarly,  we  have  s  =  s  +  oe^,  where  e^  =  (0,...,0,1)  is  the  n-th  standard 
unit  vector  of  Rn  and  a  is  chosen  so  that  ||  s||  =  6  (with  signfs^)  =  signls^)), 
and  we  find 
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(3.10) 


<Kot)  -  « Ma*)  <  962. 


Now  (3.2)  and  (3.3)  imply 


(3.11)  |  gn  |  <  (h  +  a)  ||  s  ||  <  066, 

while  (3.8)  implies  |s  (a*)  -  s  (a )  I  <  6  .  Together  with  (3.9)  and  (3.10),  these 

n  n  1 

yield 

(3.12)  *(s  «*))  -  *< s*)  =  t|i(a)  -  4*( a*)  -  g  [s  (<»*>  -  5  (a)l 

n  n  n 

<  (1  +6)062. 

Now  we  deduce  a  bound  on  ^(s)  -  ^(s(a)).  For  brevity,  denote  s(a)  by  s  . 
Then  (1.1)  gives 

(3.13)  <fi( s)  -  *f(s  (a) )  =  gT(s  -  s)  +  j  sTHs  -  j  sTHs  . 

/.T  » 

To  derive  a  convenient  expression  for  s  Hs,  it  is  useful  to  let 

(3.14)  f  =  (H  ,+  <xl)v, 

so  that  Hv  =  f  -  av.  Note  from  (3.4b)  that 


(3.15) 


f  <  tee  . 


Since  s  =  s  +  ov  and  ||  s||  =  6,  we  have  a2  +  2asTv  =  62  -  ||  s  || 2  and 


sHs  =  sTHs  +  a2vHv  +  2asTHv  *  sTHs  +  a2 (vTf  -  a)  +  2o(sTf  -  asTv) 


*  sTHs  -  a[62  -  ||  s ||  2  ]  +  o  [ov  +  2s]  Tf. 

-  T 

Similarly,  we  have  s  =  s  +  oe  ,  where  e  =  (0,...,0,1)  is  the  n-th  standard 

n  n 

unit  vector  of  lRn  and  a  is  chosen  so  that  ||  s II  =  6  (with  sign(s  )  =  sign(s  )), 

M  u  n  n 

and  we  find 
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s^Hs  =  sTHs  +  h  [6  2  -  II  s  ||2  ]  . 

n  M  1 


Together  with  (3.13),  these  equations  imply 

*>(s)  -  *>(s(a) )  =  gT(s  -  s)  -  j  (a  +  hfi)  (6 2  -  j|  s  J|2  )  +  j  o(av  +  2s)Tf 

T  ~  1  T 

<  g  (av  -  ae  )  +  —  a(ov  +  2s)  f  . 

n  z 

T  T  T 

Now  (3.1)  and  (3.14)  give  gv=-s  (H+  al)v=-sf,  and  the  definitions  of  a 

and  a  imply  |a|  <  6  and  |o|  <  6,  so  (3.3),  (3.4a),  (3.11),  and  (3.15) 

„  2  2  1 

combine  with  the  above  inequality  to  give  <fi  (s)  -  ^(s(a))  <  66  <0  +  ©66  +  J  6(6+2B6)<© 

<  (B(2k  +  1)  +  j  k]©62  . 

Combining  this  with  (3.12)  and  Theorem  2.1,  we  finally  obtain 

I 

| 

(3.16)  0  <*>(s)  -  *>( s*)<  [2g(ic  +  1)  +  \  <  +  1]©62. 

This  leads  to 

Theorem  3.2:  Let  (0,1)/  e  e(0,l),  and  (l,oo)  be  given  and  suppose  g  ^  0 

and  that  a  ^  0  renders  H  +  al  positive  definite.  If  the  s  of  (3.1)  satisfies 
(3.3)  and  the  6  of  (3.2)  satisfies 

(3.17)  ©  <  «  6~2  ( a  ||  s  || 2  -  gTs)/[4B(K  +  1)  +  k  +  2]  , 

and  if  s  is  determined  by  (3.5),  where  v  satisfies  (3.4),  then  (3.7)  holds. 

Proof :  From  (1.1)  and  (3.1)  we  have 

T  1  T  1  n  1 1 2 

*>(s)  =  g  s  +  j  s  (H  +  ai)s  -  —  o||  s  || 

=  gTs  -  \  gTs  -  j  a  ||  s  ||2 

=  j  (g  s  -  a  ||  s  ||2  ) . 
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4 .  Choosing  a 


If  either  H  is  indefinite  or  the  Newton  step  =  -H  ^g  is  too  large, 

then  computing  an  OLC  step  s*  requires  finding  a  solution  a*  to  the  scalar 
nonlinear  equation  ||  D(H  +  ciDTD)  1g||  =  6  .  There  are  many  iterations  for  approx¬ 
imating  such  an  a*  — see  Gander's  excellent  discussion  in  §6  of  [Gan78] .  In 
view  of  the  fast  convergence  reported  by  More  [Mor78] ,  we  prefer  to  use  an  itera¬ 
tion  proposed  by  Reinsch  [Rei71]  and  independently  by  Hebden  [Heb73] ,  together 
with  More's  (modification  of  Hebden's)  safeguarding  scheme.  Let 


iji(a)  :  =  ||  D(H  +  aDiD)  g  ||  -  6 


TnN_l  II-1  _  fT1 


The  basic  iteration  is  Newton’s  method  applied  to  <(/.  Thus  if  iterate  renders 
T 

H  +  ct^D  D  positive  definite  but  yields  an  unacceptable  step,  then  we  compute  a 
tentative  value  (o^)  for  a  ,  i.e. 


(4.2a) 


\+l=ak+  II  Dsll  2  01  Ds  ||  -  6  )/[<5sTDTD(H  +  aDTD)_1DTDs]  , 


where 


(4.2b) 


T  -1 

s  =  s(ak)  =  “  (H  +  a  j^D  D)  g  . 


We  also  maintain  lower  and  upper  bounds  and  u^  on  a  and,  if  H  is  not 

positive  definite,  a  value  nk  ^  0  such  that  is  an  upper  bound  on  the  smallest 

-T  -1  ... 

eigenvalue  of  D  HD  .  (For  convenience,  we  set  n  =  -1  if  H  is  positive 

definite.)  We  discuss  below  how  these  quantities  are  updated.  Once  and 

u,  ,  have  been  determined,  we  obtain  a  safeguarded  a.  from  the  rule 
k+1  k+1 


Vi  if  W^k+i*  Uk+1  and  5k+!  >  Vi  ; 


""3  1/2 

max  {  10  *  uk+1,  (^k+1uk+1)  }  otherwise. 


-15- 


Again  assume  variables  have  been  changed  so  that  D  =  I.  To  solve 

linear  systems  involving  H  +  a^I,  e.g.  to  compute  s(ak)  in  (3.1),  we  recom- 

T 

mend  attempting  to  compute  the  Cholesky  (or  LDL  )  decomposition  of  H  +  a^I 

(see  e.g.  §3.3  of  [Ste73]) .  If  this  works,  then  we  may  regard  H  +  a^I  as 

numerically  positive  definite  (provided  the  Cholesky  factor  has  no  zeros  on 

the  diagonal) .  Otherwise  for  some  1  between  1  and  n  we  may  express  the 

T 

leading  principal  l  x  i  submatrix  of  H  +  a^l  as  LML  ,  where  L  is  a 

lower  triangular  matrix  with  nonzero  diagonal  and  M  is  the  diagonal  matrix 

diag(l,  1,...,  1,  p) .  Both  L  and  p  are  readily  available  as  a  byproduct 

-T 

of  the  attempted  factorization,  and  p  <  0.  We  compute  z  =  L  e^  (i.e., 

solve  Lz  =  e^) ,  where  e^  =  (0,  0,...,  0,l,)Te]RS'.  Then  (§)  (H  +  a^I)  (8) 

=  z  U1L  z  =  e^Me^  =  p,  so  p/||  z||  is  a  Raleigh  quotient  for  H  +  a  I,  and 
(P/||  z||2  )  -  <.0  is  an  upper  bound  on  the  smallest  eigenvalue  of  H.  Hence 

we  set 

(4,4a)  Vi =  Vi =  “k " W/I1 2,12  and 

(4.4b)  Vi  =  \ 

and  choose  “k+1  =  ak  (which  will  force  a  safeguarded  choice  of  ak+1  in  (4.3)). 

If  H  +  a  i  is  positive  definite  then  the  concavity  of  ^  [Rei7l]  implies 
that  the  Newton  iterate  «k+1  given  by  (4.2)  satisfies  <*k+^  <  a*.  In  this  case 
we  recommend  using  the  following  variation  on  More’s  update  prescription  for 


«-k  and  \: 

if  iMo^) 

<  0 ,  then 

choose 

(4.5a) 

Vi 

=  “k+l 

and 

(4.5b) 

Vi 

=  Uk  * 
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Otherwise  choose 


(4.6a)  *k+1  =  maxUk,  aR+1}  and 

(4.6b)  Vi  =  V 

In  both  cases,  let  n,  =  n,  . 

k+1  k 

To  obtain  the  initial  bounds  £,  ^  and  u^,  we  obtain  lower  and  upper  bounds 
EMIN  and  EMAX  on  the  eigenvalues  of  H  from  the  Gerschgorin  circle  theorem  (optimized 
by  the  diagonal  scaling  technique  described  below) ,  and  we  exploit  the  following 
observation:  the  a*  of  Theorem  2.1  is  such  that  6  =  ||  g||  /(X  +  a*)  for  some 
A  between  the  smallest  and  largest  eigenvalues  of  H.  Thus 
(  ||  g||/6)  -  EMAX  <  a*  <  (  ||  g  |j  /6)  -  EMIN,  and  we  let 

(4.7a)  =  maxUQ,  ( ||  g||  /6  )  -  EMAX}  , 

(4.7b)  u  *  (||  g ||  /&)  -  EMIN  , 

where  l  is  given  by  whichever  of  (4.4a)  or  (4.5a)  applies,  with  a  ,  =  £  ,  =0. 

o  -1  -1 

To  compute  EMIN  and  EMAX,  we  use  a  special  case  of  the  scaling  by  diagonal 

matrices  considered  in  [Var65] .  Specifically,  we  find  a  diagonal  matrix  J9 

having  n-1  diagonal  entries  of  unity  and  one  other  positive  diagonal  entry  such 

that  the  Gerschgorin  lower  bound  on  the  spectrum  of  JDh£> 1  is  as  large  as  pos\i- 

2 

ble,  and  we  use  this  bound  as  EMIN.  This  is  quickly  done  (in  0(n  )  operations) 
as  follows:  Compute  the  off-diagonal  row  sums 


(4.8a) 


and  find  k  such  that  row  k  gives  the  minimum  Gerschgorin  lower  bound: 


(4.8b) 


V  "  °k  =  min{  . 
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For  j  ^  k,  let  6^  denote  the  auxilliary  quantity 


(4.8c) 


8.  =  (H.  ,  -  H.  .  +a.  -  H.u  )/2 
3  kk  3  1  3k 1 


and  compute 


(4.8d) 


EMIN  =  H  -  max  {8.  +  (02  +  o,  I H I ) 1/2  I  j  ?  k} 
kk  3  3  k'  3k1  1 


EMAX  is  computed  similarly:  with  ck  as  in  (4.8a),  find  k  such  that 


(4.9b) 


For  j  yt  k,  let 


\k  +  °k  =  max{  Hjj  +  aj  I  1  <  j  <  n} 


(4.9c) 


8.  =  (H.  .  -  H. .  +  a.  - 1  H.,  l)/2 
3  33  kk  3  I  3k ' 


and  compute 


( 4 . 9d ) 


EMAX  =  H  +  max {  9.  +  (02  +  a,  I  H  I  ) 1/2  I  j  *  k}  . 

kk  3  3  k1  3k1  '  J  1 


We  begin  the  quest  for  a  by  trying  a  =  0.  If  this  proves  unsatisfac¬ 
tory,  then  we  compute  and  by  (4.7-9).  If  acceptable  values  a(Prev) 

(prev) 

and  6  of  a  and  6  from  a  previously  computed  OLC  step  are  available, 

t1  “n  we  obtain  from  a  rule  which,  according  to  J.  E.  Dennis  [private  com¬ 

munication],  J.J.  More  has  found  helpful: 


(4.10) 


-  _  (prev)  (prev) 

-  0  a  /o  . 


(prev)  (prev)  — 

If  a  and  6  are  unavailable,  then  we  simply  set  =  0. 

•  .  .  R 

To  prevent  excessive  iterations,  we  deem  the  step  s  =  s  computed  from 


u  acceptable  if  66  <  ||s  ||  <  y6  for  some  specified  6c  (0,1)  and  yed*00)* 
(iiebden  [Heb73]  and  More  [Mor78]  choose  6  =  0.9  and  y  =  1.1.  In  connection 

with  an  algorithm  like  that  of  NL2SOL  [DenGW79] ,  where  6/ II  s*preV||  or  6/6*PreV* 


« 


either  equals  unity  or  two  or  lies  in  [0.1,  0.5],  Dennis  and  Schnabel  suggest 


5  =  0.75  and  y  =  1.5  [DenS79] .  Our  computational  experience  with  NL2SOL 
slightly  favors  the  former  choice.) 

In  practice,  D  is  usually  a  diagonal  matrix,  so  the  explicit  change  of 

variables  (1.3)  is  easily  performed,  and  we  recommend  actually  performing  it 

T  T 

when  H  is  given  explicitly.  When  H  has  the  form  J  J,  g  has  the  form  J  r, 

and  J  and  r  are  given  explicitly,  on  the  other  hand,  we  prefer  the  technique 

advocated  by  More  [Mor78l ,  i.e.,  using  a  OR  factorization  of  [  1/2  ]  to  com- 

L  a  D  J 

pute  s . 

The  statement  of  Algorithm  4.1  below  involves  fewer  tildas  if  the  notation 

of  (1.3)  is  reversed,  so  that  the  given  assignment  is  finding  s*  to  minimize 
~~~T~1~T~~ 

^(s):=gs+  —  s  Hs.  When  H  is  given  explicitly  and  g  /  0,  the  method  des¬ 

cribed  above  for  computing  a  reasonable  approximation  s  to  s  *  may  be  sum¬ 
marized  as  follows: 

Algorithm  4.1: 

-T  -  -1  -T~ 

Compute  H  =  D  HD  and  g  =  D  g  . 

If  H  is  positive  definite,  then: 

Compute  the  Newton  step  s^  =  -H  ^g. 

If  ||s(N)||  <  y6,  then  halt  and  return  s  =  D  ^s^. 

Set  n,  =  -1  and  determine  l  from  (4.5)  with  a  ,  =i  ,  =  0. 

1  o  -1-1 

Else  [H  not  positive  definite]  Compute  n  and  £  from  (4.4) 

o  o 

with  a  ,  =  £  ,  =  0  and  set  n,  =n  . 

-1  -1  1  o 

Compute  EMIN  and  EMAX  by  (4.8)  and  (4.9),  and  compute  and 

from  (4.7) . 

fprev)  (prev)  — 

If  6  and  6  are  available,  compute  from  (4.10);  other- 

let  =  0. 

Compute  x  from  (4.3). 


For  k 


1 >  2,  •  • . 


If  H  +  oijl  is  positive  definite,  then 

(k)  -1 

Set  nk+1  =  nk  and  compute  s  =  -  (H  +  a^I)  g. 

If  66  <  ||s  ^  ||  <  y6,  then  halt  and  return  s  =  D  . 

Compute  ak+i  from  (4.2)  with  D  :=  I. 

If  ||  s  ^  II  <  (56  ,  then 


If 


>  0  and 
(3.17)  with 


\  ‘  \ 


satisfies 


s  =  s(k),  then 


Compute  v  satisfying  (3.4). 

Compute  s  from  (3.5). 

Halt  and  return  s  =  D  ^s. 

Compute  l,  ,,  and  u,  ,  from  (4.6). 
k+1  k+1 

(U 

Else  [  ||  s  '||  >  y6  ]  compute  J^+1  and  uk+1  from  (4.5). 
Else  [H  +  akI  not  positive  definite]  set  a  =  ak 


and  compute  n,  ,  ,  i,  .  ,  u,  ,  from  (4.4). 
k+1  k+1  k+1 


Determine 


from  (4.3) . 


After  an  OLC  step  has  been  computed,  it  is  often  necessary  to  compute 
another  OLC  step  from  the  same  g  and  H  with  a  new  value  of  6*  To  handle 
this  situation,  it  is  worthwhile  to  modify  the  initial  part  of  Algorithm  4.1 


to  take  advantage  of  the  extra  information  that  is  available.  For  example. 


if  6  has  been  increased,  then  u^  can  be  set  to  the  minimum  of  the  value 
given  by  (4.7b)  and  the  last  value  u*old^  of  uk»  while  if  6  has  been 
decreased,  fcj  can  be  set  to  the  maximum  of  the  value  given  by  (4.7a)  and  the 
last  value  of  In  either  case,  can  be  set  to  r/°ld^  and 

a^  can  be  computed  from  (4.2)  with  aQ  m  . 
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5.  Numerical  Experience  with  NL2S0L 


For  use  in  NL2S0L  [DenGW79] ,  we  have  implemented  two  versions  of 

Algorithm  4.1s  GQTSTP  is  designed  for  use  with  a  general  H,  while  LMSTEP 

deals  explicitly  with  J  (or  its  QR  decomposition)  when  H  has  the  form 

T  T 

J  J,  g  has  the  form  J  r,  and  J  is  a  rectangular  matrix  with  at  least 
as  many  rows  as  columns.  Both  codes  make  special  provision  for  the  case 
in  which  an  OLC  step  is  to  be  recomputed  with  a  new  6  but  the  same  g 
and  H.  To  handle  D,  which  both  codes  assume  to  be  a  diagonal  matrix, 

GQTSTP  explicitly  changes  variables,  whereas  LMSTEP  follows  the  procedure 
recommended  by  More  [Mor78] . 

*  T 

The  way  these  codes  deal  with  (near)  singularity  rn  H  +  a  D  D 

deserves  some  further  discussion.  Both  detect  this  case  by  test  (3.17) . 

•  *  T 

In  the  case  of  LMSTEP  and  true  singularity  in  H  +  a  D  D,  we  would  have 

a*  =  0  and  H  positive  semidefinite,  and  any  v  in  the  null  space  of  H 

would  be  orthogonal  to  g,  whence  ^>(s  +  v)  =  <p(s)  .  LMSTEP  therefore  returns 

without  modification  a  step  s  for  which  (3.3)  and  (3.17)  hold.  GQTSTP 

similarly  avoids  replacing  s  by  s  =  s+  cfv  in  cases  where  ^(s)  and 

(s  )  would  not  differ  significantly.  Specifically,  if  *p( s)  -  <p(s)  <  -(e/3)^(s)» 

then  GQTSTP  returns  s  rather  than  s.  To  assure  that  (3.7)  holds  in  this 

case,  GQTSTP  uses  (3.17)  with  e  replaced  by  2e/3. 

LMSTEP  and  GQTSTP  depart  slightly  from  Algorithm  4.1  in  the  calcula- 

-tionof  ak+1  and  *k+1-  They  use  <*k+1  =  Sk+1  i**k+1<  Sk+1  <  uk+l 

in  place  of  the  first  part  of  (4.3),  ^k+1  =  ^k  in  Place  of  (4.6a),  and 

£  k+1  =ak  "  <a k. ) '  (ak^  in  Place  of  (4.5a),  where  'iji(a)  *  ||  D(H  +  aDTD  1)i||  -6  . 

This  gives  a  worse  bound  than  the  one  derived  from  the  ip  of  (4.1).  J.  J. 
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More  [private  communication]  pointed  out  the  superiority  of  the  latter,  but 
budgetary  and  time  constraints  kept  us  from  incorporating  it  into  LMSTEP  and 
GQTSTP . 

Table  I  gives  some  statistics  on  the  performance  of  LMSTEP  and  GQTSTP 
when  NL2S0L  is  run  on  the  problem  set  described  in  [DenGW79] .  (The 
statistics  were  gathered  on  the  Univac  1110  computer  at  the  University  of 
Wisconsin  using  a  preliminary  double-precision  version  of  NL2S0L  with 
V(VCONCR)  =  0  and  other  input  values  at  their  defaults,  with  the  exceptions 
described  in  §7  of  [DenGW79].)  The  column  headed  "No.  of  Steps"  gives  the 
total  number  of  non-Newton  steps  computed  and  the  column  headed  "%  of  Total" 
tells  what  percentage  these  were  of  all  the  steps  computed  by  the  module  in 
question.  The  average  value  of  k  when  Algorithm  4.1  halted  (averaged 
over  the  non-Newton  steps)  appears  in  the  column  labelled  "k  mean,"  and  the 

maximum  such  value  appears  under  "k  max".  In  the  columns  headed  "Special  Case"  are 

*  T 

the  percentage  of  non-Newton  steps  in  which  (near)  singularity  in  H  +  a  D  D 
was  detected  and  the  mean  and  maximum  final  values  of  k  for  these  steps. 

It  is  fortunate  that  the  mean  values  of  k  are  so  low.  Had  they 
been  much  greater  than  two  in  the  case  of  LMSTEP  or  four  in  the  case  of  GQTSTP, 
then  it  would  have  been  possible  to  save  some  time  in  these  modules  by  pre¬ 
processing  the  input  matrices  to  a  sparser  form:  by  reducing  J  to  bidiagonal 
form  in  LMSTEP  or  reducing  D  1HD  1  to  tridiagonal  form  in  GQTSTP  (see  §§7.5  and 
7.1  of  [Ste73]  and  the  references  cited  therein). 

After  all  the  effort  spent  in  studying  the  special  case,  it  is  some¬ 
what  disappointing  to  see  that  GQTSTP  never  detected  this  case.  It  will  be 
interesting  to  see  how  often  GQTSTP  detects  it  when  used  in  solving  other 
kinds  of  optimization  problems.  On  the  other  hand,  detecting  the  special 
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All  (Non-Newton)  OLC  Steps 
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